Search CORE

304 research outputs found

Optimizing MapReduce for Multicore Architectures

Author: Kaashoek Frans
Mao Yandong
Morris Robert
Publication venue
Publication date: 01/01/2010
Field of study

MapReduce is a programming model for data-parallel programs originally intended for data centers. MapReduce simplifies parallel programming, hiding synchronization and task management. These properties make it a promising programming model for future processors with many cores, and existing MapReduce libraries such as Phoenix have demonstrated that applications written with MapReduce perform competitively with those written with Pthreads. This paper explores the design of the MapReduce data structures for grouping intermediate key/value pairs, which is often a performance bottleneck on multicore processors. The paper finds the best choice depends on workload characteristics, such as the number of keys used by the application, the degree of repetition of keys, etc. This paper also introduces a new MapReduce library, Metis, with a compromise data structure designed to perform well for most workloads. Experiments with the Phoenix benchmarks on a 16-core AMD-based servershow that Metisâ data structure performs better than simpler alternatives, including Phoenix

CiteSeerX

DSpace@MIT

Whanaungatanga: Sybil-proof routing with social networks

Author: Kaashoek M. Frans
Lesniewski-Laas Chris
Publication venue
Publication date: 24/09/2009
Field of study

Decentralized systems, such as distributed hash tables, are subject to the Sybil attack, in which an adversary creates many false identities to increase its influence. This paper proposes a routing protocol for a distributed hash table that is strongly resistant to the Sybil attack. This is the first solution to this problem with sublinear run time and space usage. The protocol uses the social connections between users to build routing tables that enable Sybil-resistant distributed hash table lookups. With a social network of N well-connected honest nodes, the protocol can tolerate up to O(N/log N) "attack edges" (social links from honest users to phony identities). This means that an adversary has to fool a large fraction of the honest users before any lookups will fail. The protocol builds routing tables that contain O(N log^(3/2) N) entries per node. Lookups take O(1) time. Simulation results, using social network graphs from LiveJournal, Flickr, and YouTube, confirm the analytical results

DSpace@MIT

Retroactive auditing

Author: Kaashoek M. Frans
Wang Xi
Zeldovich Nickolai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

Retroactive auditing is a new approach for detecting past intrusions and vulnerability exploits based on security patches. It works by spawning two copies of the code that was patched, one with and one without the patch, and running both of them on the same inputs observed during the system's original execution. If the resulting outputs differ, an alarm is raised, since the input may have triggered the patched vulnerability. Unlike prior tools, retroactive auditing does not require developers to write predicates for each vulnerability.United States. Defense Advanced Research Projects Agency. Clean-slate design of Resilient, Adaptive, Secure Hosts (Contract number N66001-10-2-4089)National Natural Science Foundation (CNS-1053143

CiteSeerX

DSpace@MIT

Crossref

CPHASH: A cache-partitioned hash table

Author: Kaashoek M. Frans
Metreveli Zviad
Zeldovich Nickolai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/11/2011
Field of study

CPHash is a concurrent hash table for multicore processors. CPHash partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partition. CPHash's message passing avoids the need for locks, pipelines batches of asynchronous messages, and packs multiple messages into a single cache line transfer. Experiments on a 80-core machine with 2 hardware threads per core show that CPHash has ~1.6x higher throughput than a hash table implemented using fine-grained locks. An analysis shows that CPHash wins because it experiences fewer cache misses and its cache misses are less expensive, because of less contention for the on-chip interconnect and DRAM. CPServer, a key/value cache server using CPHash, achieves ~5% higher throughput than a key/value cache server that uses a hash table with fine-grained locks, but both achieve better throughput and scalability than memcached. The throughput of CPHash and CPServer also scale near-linearly with the number of cores.Quanta Computer (Firm)National Science Foundation (U.S.). (Award 915164

CiteSeerX

DSpace@MIT

Crossref

User-Relative Names for Globally Connected Personal Devices

Author: Ford Bryan
Kaashoek Frans
Lesniewski-Laas Chris
Morris Robert
Rhea Sean
Strauss Jacob
Publication venue
Publication date: 01/01/2006
Field of study

Nontechnical users who own increasingly ubiquitous network-enabled personal devices such as laptops, digital cameras, and smart phones need a simple, intuitive, and secure way to share information and services between their devices. User Information Architecture, or UIA, is a novel naming and peer-to-peer connectivity architecture addressing this need. Users assign UIA names by "introducing" devices to each other on a common local-area network, but these names remain securely bound to their target as devices migrate. Multiple devices owned by the same user, once introduced, automatically merge their namespaces to form a distributed "personal cluster" that the owner can access or modify from any of his devices. Instead of requiring users to allocate globally unique names from a central authority, UIA enables users to assign their own "user-relative" names both to their own devices and to other users. With UIA, for example, Alice can always access her iPod from any of her own personal devices at any location via the name "ipod", and her friend Bob can access her iPod via a relative name like "ipod.Alice".Comment: 7 pages, 1 figure, 1 tabl

arXiv.org e-Print Archive

CiteSeerX

Resilient overlay networks

Author: David Andersen
Frans Kaashoek
Hari Balakrishnan
Robert Morris
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2003
Field of study

Crossref

Improving application security with data flow assertions

Author: Kaashoek M. Frans
Wang Xi
Yip Alexander
Zeldovich Nickolai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009
Field of study

Resin is a new language runtime that helps prevent security vulnerabilities, by allowing programmers to specify application-level data flow assertions. Resin provides policy objects, which programmers use to specify assertion code and metadata; data tracking, which allows programmers to associate assertions with application data, and to keep track of assertions as the data flow through the application; and filter objects, which programmers use to define data flow boundaries at which assertions are checked. Resin's runtime checks data flow assertions by propagating policy objects along with data, as that data moves through the application, and then invoking filter objects when data crosses a data flow boundary, such as when writing data to the network or a file. Using Resin, Web application programmers can prevent a range of problems, from SQL injection and cross-site scripting, to inadvertent password disclosure and missing access control checks. Adding a Resin assertion to an application requires few changes to the existing application code, and an assertion can reuse existing code and data structures. For instance, 23 lines of code detect and prevent three previously-unknown missing access control vulnerabilities in phpBB, a popular Web forum application. Other assertions comprising tens of lines of code prevent a range of vulnerabilities in Python and PHP applications. A prototype of Resin incurs a 33% CPU overhead running the HotCRP conference management application.Nokia Researc

CiteSeerX

DSpace@MIT

Crossref

Operating system extensibility through event capture

Author: M. Frans Kaashoek
Thomas Pinckney III
Thomas Pinckney Iii
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1997
Field of study

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.Includes bibliographical references (p. 31).by Thomas Pinckney III.M.Eng

CiteSeerX

DSpace@MIT

Reinventing Scheduling for Multicore Systems

Author: Boyd-Wickizer Silas
Kaashoek M. Frans
Morris Robert Tappan
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2009
Field of study

High performance on multicore processors requires that schedulers be reinvented. Traditional schedulers focus on keeping execution units busy by assigning each core a thread to run. Schedulers ought to focus, however, on high utilization of on-chip memory, rather than of execution cores, to reduce the impact of expensive DRAM and remote cache accesses. A challenge in achieving good use of on-chip memory is that the memory is split up among the cores in the form of many small caches. This paper argues for a form of scheduling that assigns each object and its operations to a specific core, moving a thread among the cores as it uses different objects

CiteSeerX

DSpace@MIT